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Abstract 

Large sampling intervals can affect reconstruction of Kramers-Moyal coefficients 
from data. A new method, which is direct, non-stochastic and exact up to 
numerical accuracy, can estimate these finite-time effects. For the first time, 
exact finite-time effects are described analytically for special cases; biologically 
inspired numerical examples are also worked through numerically. The approach 
developed here will permit better evaluation of Langcvin or Fokker-Planck based 
models from data with large sampling intervals. It can also be used to predict 
the sampling intervals for which finite-time effects become significant. 

1 Introduction 

Kramers-Moyal analysis is a recently developed [1] method of characterising, 
from time series data, an arbitrarily nonlinear stochastic process X{t). The 
Kramers-Moyal coefficients are defined as 

D^"Hxo,t) = hm l^{[x{t + T)-xor\x{t)=xo). (1) 

The smallest r available in an experiment is the sampling interval, but the 
definition requires an infinitesimally small r. If the sampling interval is not 
'small enough', the coefficients measured from experimental data may therefore 
not match those arising from an analytically constructed continuous model. This 
article presents, for the first time, a non-stochastic method of exactly predicting 
these finite sampling interval effects, which may be abbreviated to 'finite-time 
effects' throughout this article. 

The first two Kramers-Moyal coefficients appear in Fokker-Planck equations 
for transition probability density, 

^^M^ = %)P(:.,i|x',0, (2) 
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where the Fokker-Planck operator is 



L{x) 



d_ 

dx 




These first two coefficients are known as the drift and diffusion coefficients re- 
spectively, and are analogous to effective position-dependent force and noise, or 
'temperature', respectively [2]. The drift and diffusion coefficients, then, provide 
an intuitive description of the process. If the process is also Markovian, they 
provide a complete characterisation; furthermore, all higher coefficients will be 
zero [3]. Unlike many other methods of time series analysis, the Kramers-Moyal 
approach recovers coefficients with arbitrary nonlinearity, up to the resolution 
of the bin size chosen in the reconstruction. 

The Kramers-Moyal approach has previously been applied to neuroscience 
[U , cardiology [4] , traffic engineering [2] , finance [6] and turbulence [7] . These 
applications frequently assume homogeneity, in the sense that the Kramers- 
Moyal coefficients are time- invariant, so that the ensemble average in Eq. ([T]) 
can be replaced by a time average. We will also assume homogeneity throughout 
this article. 

The burgeoning collections of data available with recent advances in quanti- 
tative biology also beckons use of such methods, for biological systems are often 
well- modelled by an overdamped and Brownian environment. For example, one 
phase in the 'walking' of naturally occurring molecular motors such as myosin- 
V and kinesin is beheved to be a tethered diffusion state, where one 'head' of 
the bipedal motor is unbound and searching for the next binding site O [9l [10] . 
We will use this system to illustrate the results we obtain. The unbound head 
can be modelled as undergoing diffusion on a sphere with center on the neck 
juncture and with radius equal to the length of the tether. We refer to this case 
as 'free diffusion on a sphere'. However, the neck juncture itself is not fixed, for 
its tether is flexible about its preferred orientation jHj. We model the motion 
of the juncture as diffusion on a sphere with a potential: 'biased diffusion on 
a sphere'. Combining these models, a more precise treatment of the unbound 
head would have it undergoing free diffusion on a sphere about a point which is 
itself undergoing biased diffusion on a sphere: 'compound diffusion'. In all cases 
we assume the experiment can only measure in one linear dimension, inclined at 
angle from the direction of potential minimum (where there is a potential) . 
These are summarised in Fig. [1] 

Ragwitz and Kantz [12j were the first to consider finite sampling interval 
effects in the estimation of Kramers-Moyal coefficients. They calculated a par- 
tial [13] approximation to such effects to first order in the sampling interval. 
An exchange with Friedrich et al. ensued [13l[14], with Freidrich et al. provid- 
ing an infinite series expansion for the finite-time correction. Later, Kleinhans 
et al. [15] described a numerical method for recovering the continuous-time 
Kramers-Moyal coefficients from finite-time estimates. From initial continuous- 
time guesses, their method computed the resulting finite-time estimates by di- 
rect numerical simulation of the Langevin equation corresponding to the Fokker- 
Planck equation ((2|). They update the guesses based on the known finite-time 
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Figure 1: Schematic of the biologically- inspired examples used in this article: 
(a) free diffusion on a sphere, or 'tethered diffusion'; (b) biased diffusion on a 
sphere; and (c) compound diffusion on a sphere. 

measurements. Our method avoids the need for stochastic Langevin simulations. 

In Section 2 we review Friedrich et al. 's infinite series formula, then proceed 
to our main result. Section 3 calculates the finite-time effects for some analytical 
special cases, while section 4 provides some numerical examples. These examples 
are chosen for their relevance to possible biophysical uses of Kramers-Moyal 
analysis. Points of a general nature are made in section 5, and concluding 
remarks in section 6. 

2 Derivation 

We denote the Kramers-Moyal coefficients ([T} calculated with finite sampling 
interval r by 

Di^Hxo) = -^([x{t + r)- xor\xit) = xo). (3) 
This can also be written as 

1 

(a;o) = _ / {x- xoTPix, to + t\xo, to)dx, 

which since we assume homogeneity is independent of to- Substituting the 
formal solution to the Fokker-Planck equation ^ 

P{x, t + T\xo,t) = e-^(^)^(5(.T - 2:0), 
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we obtain 

D(")(a;o) = — 7 / {x- XoTe^^'='>^S{x - xO)dx (4) 

™' J-oo 

= e^'(^)^(a;-a;orU=.o. (5) 

Here 

is the adjoint of the operator L over the inner product (g, ft,) = J_ao g{x)h{x)dx. 
Friedrich et al. [13] expand the symbohc exponential, and show that the first- 
order terms lead to the first-order correction of Ragwitz and Kantz [12j . 

Equation ([5]) has an alternative interpretation, by a route Zwanzig [16] calls 
the 'Heisenberg approach' in analogy to the complementary formulations of 
quantum mechanics. This interpretation is that the solution of the partial dif- 
ferential equation 

'-^^L^i.)Wix,t) (6) 

at (xo,t), with initial condition W{x,Q) = {x — xo)"/n!T, gives 

Therefore, knowing the continuous-time D^^\x) and D'^'^\x) we can predict 
the effect of finite sampling interval without direct simulation of the Langevin 
equations. This is the main result of this article. We will refer to Eq. ^ as 
the adjoint Fokker-Planck equation; note it is different to, and should not be 
confused with, the Kolmolgorov backward equation [3] . 

One other obvious non-stochastic approach is to compute Eq. but this 
involves a Dirac-delta initial condition. For some special cases, this initial condi- 
tion could be treated analytically. Indeed, this is the method by which Ragwitz 
and Kantz |12j obtained their approximate corrections. In general, however, 
where numerical methods are required, the Dirac-delta initial condition would 
present significant difficulties. 



3 Analytic special cases 



In some cases we can solve the adjoint Fokker-Planck equation ([6|) analytically. 
One of the simplest scenarios is a linear drift D'^^\x) = —kx, corresponding 
to a quadratic potential well, and a constant diffusion D^^''{x) ~ D. This is 
the Ornstein-Uhlenbeck process in one dimension [3]. With initial condition 
W{x, Q) = X — Xq, the solution is W{x, r) = xe~^'^ — Xq, so the finite-time drift 
coefficient measured at finite sampling interval r is 



-fcT\ 



(7) 



With initial condition W{x^Q) = {x — xq)^, the solution is VF(x,t) = [xe 



(1 



). Therefore the finite-time diffusion coefficient is 



D?\x) 



1 

27 



x'il 



-2fcT^ 



(8) 
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As T increases, the finite-time drift coefficient for the 1-D Ornstein-Uhlenbeck 
process remains hnear but decreases in gradient, while a quadratic component 
appears in the diffusion. This causes the diffusion estimate to be larger at the 
edges of a potential well than at the center, an effect first predicted by Ragwitz 
and Kantz [T^ . 

Free diffusion on the surface of a sphere of radius r, as sketched in Fig. HJ^a), 
can be represented in polar co-ordinates {9, (p) by the Langevin equations [17j 

de = D cot edt 



written in the Ito convention. Projected onto a rectangular co-ordinate x 
rcosO, these reduce to the Langevin equation [18] 




dx = ~2Dxdt - y/2D{r'^ - x^)dw, (9) 

corresponding to drift and diffusion coefficients D^^''{x) = —2Dx and D^^\x) = 
D{r^ — x^). With initial condition W{x, 0) ~ x~ Xq, the solution to the adjoint 
Fokker-Planck equation © is W{x,t) — xe^'^^'^ — xq, so the finite-time drift 
coefficient is 

I?«(.t) =--(l-e-2^-). (10) 

T 

With initial condition W{x, 0) = (x — xq)^, the solution is W{x, t) = x'^e~^^'^ — 
2xQxe~^^'^ + Xq — r"^ (e~^^^ — l) /3. Therefore the finite-time diffusion coeffi- 
cient is 

D(^) (x) = ^ [x^ (1 + - 2e-2^-) + (l - e-^^^)] . (11) 

For free diffusion on a sphere, the gradient of the drift coefficient pil)) again 
simply decreases with t. Notice that the diffusion (|lip . however, undergoes a 
much more qualitatively significant change: it changes from convex to concave 
as T increases. This change occurs at r = ln(l/2 -t- \/5/2)/2D w 1/4_D. 

As a general rule of thumb, we see that if the sampling interval is of order 
l/fc (for Ornstein-Uhlenbeck) or 1/D (free diffusion on a sphere), or larger, 
we should be concerned about finite sampling interval effects. Also, a concave 
diffusion is an indicator there may be finite-time effects. In both examples the 
original D^^-' and D'-^-' are recovered in the limit r ^ 0. 



4 Numerics 

The finite-time D'i!'\xo) may also be computed numerically. For each value 
Di"\xo) that is required, the adjoint Fokker-Planck equation ^ with initial 
condition W{x, 0) = {x — xq)^ /nW must be solved up to time t = t. 
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Biased diffusion on a sphere [Fig. [TJb)] can be modelled by the Langevin 
equations, in polar co-ordinates (0, </>), by [17j 



dO^ {-deU + D cot 9)dt 

d(b = — ^"^f^ dt + ^ ^ dwch- 
sin^ 9 sm9 ^ 

We use a harmonic potential U = k9'^/2; ioi k — we recover the free diffu- 
sion on a sphere of the previous section. We projected the system onto one- 
dimensional continuous-time Kramers-Moyal coefficients by the method of Lade 
and Kivshar [18]. This co-ordinate was the rectangular co-ordinate at angle 
9o from the preferred angle of orientation {9 = 0), that is, x = cos cos 6* -|- 
sin ^0 sin 6* cos 0. We then estimated the finite-time Di^\x) and d'^\x) for a 
variety of r by numerically solving the adjoint Fokker-Planck equation with a 
simple forward-time centred-space scheme and extrapolated boundary condi- 
tions. For comparison, we also computed the drift and diffusion coefficients, as 
per Eq. ([3]), from direct simulation of the Langevin equations (fT2|) . The results 
are shown in Fig. [21 

We observe the same decrease in gradient of the drift as in the Ornstein- 
Uhlenbeck process. Since the drift is roughly displacement divided by sampling 
interval r, and since the displacement of the projection x between measure- 
ments is limited (by the radius of the sphere) , it makes sense that the estimated 
drift decreases with increasing r. Unlike the Ornstein-Uhlenbeck process, there 
is some curvature due to geometrical effects. The zero crossing point reflects 
the position of the potential minimum at x = cos • With the diffusion, we 
observe a loss of features and convergence to an upright quadratic shape as in 
the Ornstein-Ulhenbeck process. As observed by Ragwitz and Kantz 12J, this 
shape is because at large r, nonzero drift can lead to overestimation of diffusion, 
and the drift is larger at large position x. In both drift and diffusion curves, 
there are errors in the direct numerical estimates due to the singularities in the 
Langevin equation a,t 9 — and tt. 

As a second example, we simulate compound diffusion on a sphere [Fig. 
[TJc)], where a point undergoes free diffusion on a sphere ^ with respect to a 
point that is itself undergoing biased diffusion on a sphere (fT2)) . Results are in 
Fig. El 

Here we observe some interesting geometric effects |18j in the continuous- 
time (r 0) drift and diffusion coefficients but which become 'washed out' 
once again to linear drift and upright quadratic diffusion for large t. 



5 Discussion 

The characteristic quadratic shape we have obtained for the diffusion coefficient 
at large sampling intervals is clearly observable in Farapour et al. [TS] and 
Ghasemi et al. [6], and was used by them in fitting the results of their analy- 
ses. If we assumed their process was actually Ornstein-Uhlenbeck, using their 
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Figure 2: Finite-time (top) drift and (bottom) diffusion coefficients for biased 
diffusion on a sphere, predicted semi-analytically from the method of section [2] 
(sohd Hues) and by direct numerical simulation (markers). The parameters were 
fc = 1, 00 = 7r/4, D = 1 and samphng interval r ^ (circles), t — 1 (triangles) 
and T = 5 (squares) . The direct numerical results were not plotted where there 
were insufficient data. 
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Figure 3: Finite-time (top) drift and (bottom) diffusion coefficients for com- 
pound diffusion on a sphere, predicted semi-analytically from the method of 
section [2] (sohd hues) and by direct numerical simulation (markers) . The pa- 
rameters were A: = 1, 6*0 = 7r/4 (for the biased diffusion), D — 1 (for both parts 
of the compound) and sampling interval r ^ (circles), t = 1 (triangles) and 
T = 5 (squares). The direct numerical results were not plotted where there were 
insufficient data. 
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sampling intervals and the analytic solutions ([TUB]) we could solve for k and D 
in each case, and found a good match with their numerical drift and diffusion 
curves. Note that, given r, this problem is overdetermined: the functional forms 
leave three simultaneous equations from which to determine k and D. Thus the 
processes investigated by those authors (specfically, the logarithmic increments) 
are well modelled in continuous time by the above simple Ornstein-Uhlenbeck 
process. 

The drift and diffusion coefficients observed in some cascade analyses of time 
series [501 HIl IS] also display patterns similar to the above, with the scale size 
standing in for sampling interval. For Jafari et al. [21| the trends match almost 
exactly with k = 0.0055, D = 2.9 x 10"'* and t = Ax/8 with length scale Ax 
given as in the legends to their figure 2. For Kimiagar et al. [2^ the directions 
of the trends do not match as precisely, while Friedrich et al. [5] only graph for 
one T. In these two cases the finite-time drift and diffusion coefficients seem to 
match those expected for the Ornstein-Uhlenbeck model above, but there is no 
explicable mapping from length scale Ax onto r. 

Several of these authors [3T1 I^Dl [S] then perform Langevin simulations with 
their numerical drift and diffusion coefficients, which we have shown to be suf- 
fering from finite-time effects. But in all cases they observe excellent agreement 
with the reference time series. We claim this is because the finite-time drift and 
diffusion coefficients in general provide for better simulation of the time series 
if the same time step as the original sampling interval is used, a subtlety these 
authors overlook. For example, in the simple Euler-Maryuama [22] algorithm, 
oi^^ by construction provides an unbiased estimate for x{t + t) from xit). 

We have assumed that the evolution of the measurand can be fully described 
by the Fokker-Planck equation ([2]) and the drift and diffusion coefficients. This 
is valid if the process is Markovian. Projections of a larger-dimensional system, 
such as those used in section 4, are in general not Markov, even when the original 
system is [16]. However, it has been shown that the projections of section |4] are 
approximately Markovian [18j . and the excellent agreement with direct numer- 
ical simulations in that section confirms that the method worked successfully. 
Also, the method could easily be extended to other evolution operators L than 
the Fokker-Planck operator, in the event that this operator is not appropriate. 

In principle, the numerical method outlined above could be inverted iter- 
atively, preferably with more sophisticated methods of numerical solution, to 
find continuous drift and diffusion coefficients from experimentally measured 
finite-time coefficients. The only published method [Ej to solve this 'inverse 
problem' is also numerical, but relies instead on direct simulation of the esti- 
mated system's Langevin equations of motion, which being stochastic must be 
repeated over many trajectories to acheive good statistics. 

Even in the absence of such modifications, the results above provide worth- 
while contributions to time series analysis by Kramers-Moyal coefficients. For 
the first time without direct integration of the Langevin equation of motion, 
we have provided an exact method with which to predict the effects of finite 
sampling interval on the estimation of Kramers-Moyal coefficients. Our method 
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contains no stochastic terms, being based on Fokker-Planck equations, and re- 
quires no ensemble (or time) averaging. 

Given a known model for a system, our approach can be used to predict the 
effect of finite experimental sampling intervals on the estimation of Kramers- 
Moyal coefficients. Experimental observations of the Kramers- Moyal coefficients 
could then be used to support or reject the model. 

Given a time series, a quick test to see whether finite-time effects may occur 
could be to check whether the sampling interval approaches any characteristic 
periods of the system. This would occur if the differences in the time series 
over a single time step are 'large' compared to the full extent of the time series' 
fluctuations. 

6 Conclusions 

An alternative interpretation of an existing formula ([5]) was proposed, which 
permits exact prediction of finite sampling interval effects on the estimation 
of Kramers-Moyal coefficients. Special analytical cases were presented, which 
showed general features of finite sampling interval effects. The approach was 
also implemented numerically, for examples of particular relevance to biophysics. 
Previously published Kramers-Moyal analyses showed features possibly explica- 
ble as finite-time effects, in particular, concave diffusion curves. The work can 
permit better evaluation of Langevin or Fokker-Planck based models with data 
that has large sampling intervals, or to predict the sampling intervals for which 
finite-time effects should become significant. 
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